interaction policy
KIX: A Metacognitive Generalization Framework
Humans and other animals aptly exhibit general intelligence behaviors in solving a variety of tasks with flexibility and ability to adapt to novel situations by reusing and applying high level knowledge acquired over time. But artificial agents are more of a specialist, lacking such generalist behaviors. Artificial agents will require understanding and exploiting critical structured knowledge representations. We present a metacognitive generalization framework, Knowledge-Interaction-eXecution (KIX), and argue that interactions with objects leveraging type space facilitate the learning of transferable interaction concepts and generalization. It is a natural way of integrating knowledge into reinforcement learning and promising to act as an enabler for autonomous and generalist behaviors in artificial intelligence systems.
Deconstructing Cooperation and Ostracism via Multi-Agent Reinforcement Learning
Ueshima, Atsushi, Omidshafiei, Shayegan, Shirado, Hirokazu
Cooperation is challenging in biological systems, human societies, and multi-agent systems in general. While a group can benefit when everyone cooperates, it is tempting for each agent to act selfishly instead. Prior human studies show that people can overcome such social dilemmas while choosing interaction partners, i.e., strategic network rewiring. However, little is known about how agents, including humans, can learn about cooperation from strategic rewiring and vice versa. Here, we perform multi-agent reinforcement learning simulations in which two agents play the Prisoner's Dilemma game iteratively. Each agent has two policies: one controls whether to cooperate or defect; the other controls whether to rewire connections with another agent. This setting enables us to disentangle complex causal dynamics between cooperation and network rewiring. We find that network rewiring facilitates mutual cooperation even when one agent always offers cooperation, which is vulnerable to free-riding. We then confirm that the network-rewiring effect is exerted through agents' learning of ostracism, that is, connecting to cooperators and disconnecting from defectors. However, we also find that ostracism alone is not sufficient to make cooperation emerge. Instead, ostracism emerges from the learning of cooperation, and existing cooperation is subsequently reinforced due to the presence of ostracism. Our findings provide insights into the conditions and mechanisms necessary for the emergence of cooperation with network rewiring.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- South America > Brazil > São Paulo (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (4 more...)
Physically Plausible Full-Body Hand-Object Interaction Synthesis
Braun, Jona, Christen, Sammy, Kocabas, Muhammed, Aksan, Emre, Hilliges, Otmar
We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting. While recent advancements have addressed specific facets of human-object interactions, a comprehensive physics-based approach remains a challenge. Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may result in artifacts. In contrast, our proposed method embraces reinforcement learning (RL) and physics simulation to mitigate the limitations of data-driven approaches. Through a hierarchical framework, we first learn skill priors for both body and hand movements in a decoupled setting. The generic skill priors learn to decode a latent skill embedding into the motion of the underlying part. A high-level policy then controls hand-object interactions in these pretrained latent spaces, guided by task objectives of grasping and 3D target trajectory following. It is trained using a novel reward function that combines an adversarial style term with a task reward, encouraging natural motions while fulfilling the task incentives. Our method successfully accomplishes the complete interaction task, from approaching an object to grasping and subsequent manipulation. We compare our approach against kinematics-based baselines and show that it leads to more physically plausible motions.
- North America > United States (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- (2 more...)
Multi-Level Compositional Reasoning for Interactive Instruction Following
Bhambri, Suvaansh, Kim, Byeonghwi, Choi, Jonghyun
Robotic agents performing domestic chores by natural language directives are required to master the complex job of navigating environment and interacting with objects in the environments. The tasks given to the agents are often composite thus are challenging as completing them require to reason about multiple subtasks, e.g., bring a cup of coffee. To address the challenge, we propose to divide and conquer it by breaking the task into multiple subgoals and attend to them individually for better navigation and interaction. We call it Multi-level Compositional Reasoning Agent (MCR-Agent). Specifically, we learn a three-level action policy. At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller. At the middle level, we discriminatively control the agent's navigation by a master policy by alternating between a navigation policy and various independent interaction policies. Finally, at the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy. Our approach not only generates human interpretable subgoals but also achieves 2.03% absolute gain to comparable state of the arts in the efficiency metric (PLWSR in unseen set) without using rule-based planning or a semantic spatial memory.
BusyBot: Learning to Interact, Reason, and Plan in a BusyBoard Environment
Liu, Zeyi, Xu, Zhenjia, Song, Shuran
We introduce BusyBoard, a toy-inspired robot learning environment that leverages a diverse set of articulated objects and inter-object functional relations to provide rich visual feedback for robot interactions. Based on this environment, we introduce a learning framework, BusyBot, which allows an agent to jointly acquire three fundamental capabilities (interaction, reasoning, and planning) in an integrated and self-supervised manner. With the rich sensory feedback provided by BusyBoard, BusyBot first learns a policy to efficiently interact with the environment; then with data collected using the policy, BusyBot reasons the inter-object functional relations through a causal discovery network; and finally by combining the learned interaction policy and relation reasoning skill, the agent is able to perform goal-conditioned manipulation tasks. We evaluate BusyBot in both simulated and real-world environments, and validate its generalizability to unseen objects and relations. Video is available at https://youtu.be/EJ98xBJZ9ek.
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Learning Latent Traits for Simulated Cooperative Driving Tasks
DeCastro, Jonathan A., Gopinath, Deepak, Rosman, Guy, Sumner, Emily, Hakimi, Shabnam, Stent, Simon
To construct effective teaming strategies between humans and AI systems in complex, risky situations requires an understanding of individual preferences and behaviors of humans. Previously this problem has been treated in case-specific or data-agnostic ways. In this paper, we build a framework capable of capturing a compact latent representation of the human in terms of their behavior and preferences based on data from a simulated population of drivers. Our framework leverages, to the extent available, knowledge of individual preferences and types from samples within the population to deploy interaction policies appropriate for specific drivers. We then build a lightweight simulation environment, HMIway-env, for modelling one form of distracted driving behavior, and use it to generate data for different driver types and train intervention policies. We finally use this environment to quantify both the ability to discriminate drivers and the effectiveness of intervention policies.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Los Altos (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Transportation > Ground > Road (0.88)
- Leisure & Entertainment > Games (0.66)
- Information Technology > Artificial Intelligence > Robots (0.94)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework
Nguyen, Khanh, Bisk, Yonatan, Daumé, Hal III
Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially-observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage their knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks only by itself. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Maryland (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)
Learning to Perform Physics Experiments via Deep Reinforcement Learning
Denil, Misha, Agrawal, Pulkit, Kulkarni, Tejas D, Erez, Tom, Battaglia, Peter, de Freitas, Nando
When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal driven way. This process of active interaction is in the same spirit as a scientist performing experiments to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman performance in Go, Atari, natural language processing, and complex control problems; however, it is not clear that these systems can rival the scientific intuition of even a young child. In this work we introduce a basic set of tasks that require agents to estimate properties such as mass and cohesion of objects in an interactive simulated environment where they can manipulate the objects and observe the consequences. We found that deep reinforcement learning methods can learn to perform the experiments necessary to discover such hidden properties. By systematically manipulating the problem difficulty and the cost incurred by the agent for performing experiments, we found that agents learn different strategies that balance the cost of gathering information against the cost of making mistakes in different situations. We also compare our learned experimentation policies to randomized baselines and show that the learned policies lead to better predictions.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)